(54) Speech recognition

(57) Words uttered by a user are compared 2 with stored words 3; the word giving the best "score" in
the comparison is deemed to have been recognised. Where equal or similar scores occur, the result is
ambiguous, and in that case a message is generated (eg by means of a speech synthesiser 7) containing a
word for the user to confirm. If he does not confirm it, a second word may similarly be offered.
SPECIFICATION
Speech recognition 

The present invention relates to speech recognition systems. In such systems words uttered are
subjected to known pattern recognition techniques and, if correspondence with a known word is
found, suitable coded signals are generated identifying the word. Correspondence is generally
determined by generating signals or "scores" indicating the degree of similarity with stored
patterns corresponding to known words; the word having the best score is deemed to be the
word uttered. This technique fails, however, if an ambiguous result is obtained (ie if two scores
are obtained which are the same or differ only by a small amount). Normally in an interactive
arrangement the remedy is for the recognition system to respond by presenting the user with a
request to repeat the word in question.
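The best-score rule and the ambiguity condition just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: scores are taken to be numbers where higher means a closer match, and the function name `recognise` and its `margin` parameter (standing in for the "small amount" by which two scores may differ) are hypothetical.

```python
from typing import Mapping, Optional, Tuple


def recognise(scores: Mapping[str, float], margin: float) -> Tuple[str, Optional[str]]:
    """Pick the best-scoring word and report a rival if the result is ambiguous.

    Assumes higher scores mean a closer match. Returns (best, None) when the
    result is clear, or (best, runner_up) when the two top scores are equal
    or differ by less than `margin`.
    """
    if not scores:
        raise ValueError("no stored words to compare against")
    # Rank all stored words by score, best first (ties keep insertion order).
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_word, best_score = ranked[0]
    if len(ranked) > 1:
        second_word, second_score = ranked[1]
        if best_score - second_score < margin or best_score == second_score:
            return best_word, second_word  # ambiguous: both candidates returned
    return best_word, None  # unambiguous: best word deemed recognised


print(recognise({"transfer": 0.81, "statement": 0.83, "balance": 0.40}, margin=0.05))
# ('statement', 'transfer')
print(recognise({"savings": 0.90, "current": 0.52}, margin=0.05))
# ('savings', None)
```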

However, this approach suffers from the disadvantage that there is a high probability of the
ambiguity recurring; it can also be irksome for the user.
According to the present invention, therefore, there is provided a speech recognition apparatus
comprising analysis means for receiving speech signals from a user, comparing each received
word with stored representations of words to produce similarity signals indicating the degree of
correspondence between them, and producing coded signals identifying recognised words, and
output means for presenting messages to the user, the analysis means being operable, in the
event that the similarity signal in respect of a first stored representation to which a received
word most closely corresponds is equal to, or differs by less than a predetermined margin from,
the similarity signal in respect of a second stored representation, to

(a) generate via the output means a message including the word represented by the first
stored representation;

(b) await an indication from the user as to whether the word is correct;

(c) upon receipt of a positive indication, generate the said coded signal.
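Steps (a) to (c) might be sketched as below. The `ask` callback is a hypothetical stand-in for the output means plus the user's indication (it presents a message and returns True for a positive reply), the "Did you say ...?" wording is borrowed from the embodiment described later, and the word-to-code table is invented for illustration, since the specification does not define the coded signals.

```python
from typing import Callable, Mapping, Optional


def confirm_first(
    first_word: str,
    codes: Mapping[str, int],
    ask: Callable[[str], bool],
) -> Optional[int]:
    """Steps (a) to (c) for an ambiguous result (illustrative sketch).

    (a) build a message including the first (best-scoring) word and present
        it through `ask`, which stands in for the output means;
    (b) `ask` returns the user's indication: True for yes, False for no;
    (c) on a positive indication, return the coded signal for the word.
    Returns None on a negative indication, leaving the next step to the caller.
    """
    message = f"Did you say '{first_word}'?"
    if ask(message):
        return codes[first_word]
    return None


# Hypothetical word codes; the specification does not define an encoding.
CODES = {"statement": 1, "transfer": 4}

print(confirm_first("transfer", CODES, ask=lambda msg: True))    # 4
print(confirm_first("statement", CODES, ask=lambda msg: False))  # None
```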

In the event that a negative indication is received from the user, the analysis means may
generate a message requesting repetition of the word, or may
(i) generate via the output means a message including the word represented by the second
stored representation and
(ii) await an indication from the user as to whether the word is correct.
The output means may be a visual display or a speech synthesiser.
The indication from the user may be input by means of switches or a keypad, but is more
preferably given by speaking appropriate words (eg "Yes" or "No"), which may then be analysed by
the analysis means.

One embodiment of the invention will now be described, by way of example, with reference to
the accompanying drawing, which is a block diagram of a speech recognition apparatus.
In the figure, speech from a user is received by a microphone 1 connected to a speech
recogniser 2. The recogniser compares received words with the contents of a pattern store 3,
which contains representations of a repertoire of words which it is desired to recognise.

Any of a number of conventional recognition algorithms may be used, and these will not
therefore be discussed in detail. By way of example, the "VOTAN" recogniser card produced by
Votan Inc. for use with an IBM PC microcomputer might be employed.

The recogniser 2 compares a received word with each of the stored representations and
produces for each a similarity signal or "score" which indicates the closeness of fit between the
two. Normally the word whose stored representation has the best score is the one "recognised",
and a corresponding coded signal is passed via line 4 to a control unit 5 (which could, for
example, be the aforementioned IBM computer) for onward transmission or to initiate further
action, according to the purpose of the system.
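The specification leaves the scoring algorithm open. As one conventional possibility, and purely as an assumption, the score for each stored representation could be the negated dynamic time warping (DTW) distance between the received word's feature frames and the stored template, so that a higher score means a closer fit. The names `dtw_distance` and `score_all`, and the toy one-dimensional frames, are illustrative only; a real recogniser would use its own acoustic features and scoring.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one feature vector per analysis frame (assumed)


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Dynamic time warping distance with a Euclidean cost per frame pair."""
    n, m = len(a), len(b)
    # acc[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])
            acc[i][j] = step + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def score_all(word: Sequence[Frame], store: Dict[str, List[Frame]]) -> Dict[str, float]:
    """Score a received word against every stored representation (higher is better)."""
    return {name: -dtw_distance(word, template) for name, template in store.items()}


# Toy pattern store with one-dimensional frames (illustrative values only).
pattern_store = {
    "yes": [[1.0], [2.0], [3.0]],
    "no": [[3.0], [1.0], [0.0]],
}
received = [[1.0], [1.0], [2.0], [3.0]]  # "yes" spoken with a drawn-out start

scores = score_all(received, pattern_store)
print(max(scores, key=scores.get))  # yes
```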
If, however, two stored representations have the same or similar scores, the result is
ambiguous, and a signal indicating this is passed via line 6, along with codes for both words via
line 4, to the control unit 5, which responds by generating a message back to the user via a
speech synthesiser 7 and loudspeaker 8.

This message has the form "Did you say X?", where X is the word whose representation
stored in the pattern store gave rise to the better score (or, if the two scores were identical,
one of the two selected at random); the control unit then awaits a reply. The synthesiser is
assumed to have a parameter store 9 to enable it to generate appropriate words.
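The rule for choosing which word to offer first could be sketched as follows, again assuming that a higher score means a better match. The function name `offer_order` is hypothetical; the optional `random.Random` argument is only there so that the random choice on an exact tie can be made reproducible.

```python
import random
from typing import List, Optional, Tuple


def offer_order(
    a: Tuple[str, float],
    b: Tuple[str, float],
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Order two ambiguous candidates for "Did you say X?" prompts.

    The word with the better (here: higher) score is offered first; if the
    scores are identical, the first word to offer is chosen at random.
    """
    (word_a, score_a), (word_b, score_b) = a, b
    if score_a != score_b:
        return [word_a, word_b] if score_a > score_b else [word_b, word_a]
    rng = rng or random.Random()
    pair = [word_a, word_b]
    rng.shuffle(pair)  # exact tie: random order
    return pair


x, y = offer_order(("statement", 0.83), ("transfer", 0.81))
print(f"Did you say '{x}'?")  # Did you say 'statement'?
```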
If the user replies "Yes" (or "No"), this is recognised by the recogniser 2 and signalled to the
control unit 5 which, in the event of a "Yes", proceeds as if X had been identified originally. In
the event of a "No", a further message is issued via the synthesiser, viz "Did you say Y?". Again
the user's response is analysed and, if Y is confirmed, recognition is deemed complete. If the
user again replies "No", the control unit then initiates generation of a request for repetition
(although in principle, of course, a third choice could be offered).
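The confirmation sequence carried out by the control unit 5 might be modelled as a short loop over the candidates in score order. In this sketch, which is an assumption rather than the actual control-unit program, `ask` represents one prompt through the synthesiser followed by recognition of the "Yes"/"No" reply; `scripted_user` is a hypothetical test harness that replays fixed replies and records the exchange. The `max_offers` parameter reflects the remark that a third choice could in principle be offered.

```python
from typing import Callable, List, Optional, Sequence


def resolve_by_confirmation(
    candidates: Sequence[str],
    ask: Callable[[str], bool],
    max_offers: int = 2,
) -> Optional[str]:
    """Offer candidates best-first as "Did you say X?" until one is confirmed.

    Returns the confirmed word, or None once `max_offers` candidates have been
    rejected, in which case the caller should request repetition. The default
    of two offers follows the embodiment; raising it offers a third choice.
    """
    for word in candidates[:max_offers]:
        if ask(f"Did you say '{word}'?"):
            return word
    return None


def scripted_user(replies: Sequence[str], transcript: List[str]) -> Callable[[str], bool]:
    """Hypothetical stand-in for the synthesiser, the user and the yes/no recogniser."""
    answers = iter(replies)

    def ask(prompt: str) -> bool:
        reply = next(answers)
        transcript.extend([f"System: {prompt}", f"User: {reply}"])
        return reply.lower() == "yes"

    return ask


log: List[str] = []
first = resolve_by_confirmation(["statement", "transfer"], scripted_user(["No", "Yes"], log))
second = resolve_by_confirmation(["twenty", "thirty"], scripted_user(["No", "No"], log))
print(first)   # transfer
print(second)  # None (the control unit would now request repetition)
```

Replaying replies taken from the example dialogue below, the first call confirms "transfer" at the second offer, while the second rejects both offers and returns None.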

1/26/05, EAST Version: 2.0.1.4

By way of example, the apparatus might be used in a telephone banking service. Here, the
control unit would be programmed to generate questions to the user via the speech synthesiser,
and to respond by generating further questions to elicit the information required to assemble
an instruction, which may then be passed to a bank's staff or computer for effecting a credit
transfer, printing a statement, or the like.
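Assembling an instruction from a sequence of questions could be sketched as a simple slot-filling loop. The slot names, the `assemble_instruction` function and the dictionary output are hypothetical; the question wording is adapted from the example dialogue, and `get_word` is assumed to hide the whole recognise-and-confirm procedure for each answer.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical slot names; question wording adapted from the example dialogue.
TRANSFER_QUESTIONS: List[Tuple[str, str]] = [
    ("from_account", "From which account do you wish to transfer funds?"),
    ("to_account", "Which account do you wish to transfer funds to?"),
    ("amount", "How much money (in pounds) do you wish to transfer?"),
]


def assemble_instruction(
    service: str,
    questions: List[Tuple[str, str]],
    get_word: Callable[[str], str],
) -> Dict[str, str]:
    """Ask each question in turn and collect the recognised answers.

    `get_word(question)` is assumed to speak the question and return a word
    that has already been recognised (and, if ambiguous, confirmed by the user).
    """
    instruction = {"service": service}
    for slot, question in questions:
        instruction[slot] = get_word(question)
    return instruction


answers = iter(["savings", "current", "ten"])
print(assemble_instruction("transfer", TRANSFER_QUESTIONS, lambda q: next(answers)))
# {'service': 'transfer', 'from_account': 'savings', 'to_account': 'current', 'amount': 'ten'}
```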

A typical set of words, representations of which might be included in the pattern store 3,
might be:



Services:
    "Statement"        - Order full statement
    "Balance"          - Give balance
    "Mini-statement"   - Last 4 transactions
    "Transfer"         - Transfer money between accounts
    "Cheque-book"      - Order new cheque-book
    "Help"             - Request assistance from bank staff

Account Types:
    "Current"          ) Bank accounts
    "Savings"          )
    "One"              ) Credit card accounts
    "Two"              )

Amounts:
    "Ten"              )
    "Twenty"           )
    "Thirty"           ) Amount, in pounds
    "Forty"            )
    "Fifty"            )
    "Full"             - Make full payment

Cancel:
    "Stop"             - Cancel service (may be used during speech output)
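The vocabulary in the table above could be held as a simple mapping from each word to its category and meaning. The grouping of "Current" and "Savings" as bank accounts and of "One" and "Two" as credit card accounts is read from the table's layout. The `active_words` helper, which narrows the words scored at each prompt to the expected category while always allowing "Stop", is an assumed design choice that the specification does not describe.

```python
from typing import Dict, List, Tuple

# Word -> (category, meaning), transcribed from the vocabulary table.
VOCABULARY: Dict[str, Tuple[str, str]] = {
    "statement": ("Services", "Order full statement"),
    "balance": ("Services", "Give balance"),
    "mini-statement": ("Services", "Last 4 transactions"),
    "transfer": ("Services", "Transfer money between accounts"),
    "cheque-book": ("Services", "Order new cheque-book"),
    "help": ("Services", "Request assistance from bank staff"),
    "current": ("Account Types", "Bank account"),
    "savings": ("Account Types", "Bank account"),
    "one": ("Account Types", "Credit card account"),
    "two": ("Account Types", "Credit card account"),
    "ten": ("Amounts", "Amount, in pounds"),
    "twenty": ("Amounts", "Amount, in pounds"),
    "thirty": ("Amounts", "Amount, in pounds"),
    "forty": ("Amounts", "Amount, in pounds"),
    "fifty": ("Amounts", "Amount, in pounds"),
    "full": ("Amounts", "Make full payment"),
    "stop": ("Cancel", "Cancel service"),
}


def active_words(category: str) -> List[str]:
    """Words worth scoring at a prompt expecting `category` ("stop" is always allowed)."""
    words = [w for w, (cat, _) in VOCABULARY.items() if cat == category]
    if "stop" not in words:
        words.append("stop")
    return words


print(active_words("Account Types"))  # ['current', 'savings', 'one', 'two', 'stop']
```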



A typical user-machine dialogue might proceed as follows (after an entry procedure with
appropriate identity numbers and/or passwords, possibly accompanied by speaker recognition
techniques for added security):

(1) System: "Which service do you require?"
(2) User: "Transfer" (mispronounced)
(3) System: "Did you say 'statement'?"
(4) User: "No"
(5) System: "Did you say 'transfer'?"
(6) User: "Yes"
(7) System: "From which account do you wish to transfer funds?"
(8) User: "Savings"
(9) System: "Which account do you wish to transfer funds to?"
(10) User: "Current" (mispronounced)
(11) System: "Did you say 'Current'?"
(12) User: "Yes"
(13) System: "How much money (in pounds) do you wish to transfer from
     your savings account to your current account?"
(14) User: "Ten" (mispronounced)
(15) System: "Did you say 'twenty'?"
(16) User: "No"
(17) System: "Did you say 'thirty'?"
(18) User: "No"
(19) System: "How much money (in pounds) do you wish to transfer from
     your savings account to your current account?"
(20) User: "Ten"
(21) System: "Ten pounds will be transferred from your savings account
     to your current account. Do you require another service?"

Note the statements at lines 2, 10 and 14, where poor pronunciation, noise or the like has
given rise to an ambiguity which has been resolved in two cases by offering to the user the
words judged to be closest to
CLAIMS

1. A speech recognition apparatus comprising analysis means for receiving speech signals
from a user, comparing each received word with stored representations of words to produce
similarity signals indicating the degree of correspondence between them, and producing coded
signals identifying recognised words, and output means for presenting messages to the user, the
analysis means being operable, in the event that the similarity signal in respect of a first stored
representation to which a received word most closely corresponds is equal to, or differs by less
than a predetermined margin from, the similarity signal in respect of a second stored
representation, to

(a) generate via the output means a message including the word represented by the first
stored representation;

(b) await an indication from the user as to whether the word is correct;

(c) upon receipt of a positive indication, produce the said coded signal.

2. An apparatus according to claim 1 in which the analysis means is arranged, upon receipt
of a negative indication, to (i) generate via the output means a message including the word
represented by the second stored representation and (ii) await an indication from the user as to
whether the word is correct.

3. An apparatus according to claim 2 in which the output means is a speech synthesiser. 

4. A speech recognition apparatus substantially as herein described with reference to the
accompanying drawing.